ABSTRACT

A data processing system includes an interconnect, a plurality of nodes coupled to the interconnect that each include at least one agent, response logic within each node, and a queue. In response to snooping a transaction on the interconnect, each agent outputs a snoop response. In addition, the queue, which has an associated agent, allocates an entry to service the transaction. The response logic within each node accumulates a partial combined response of its node and any preceding node until a complete combined response for all of the plurality of nodes is obtained. However, prior to the associated agent receiving the complete combined response, the queue speculatively deallocates the entry if the partial combined response indicates that an agent other than the associated agent will service the transaction.

14 Claims, 6 Drawing Sheets 



[Front-page drawing: memories and caches coupled to address and data channels, with a partial combined response (PCR) channel and a combined response (CR) channel]



10/21/2003, EAST Version: 1.4.1 



US 6,591,307 B1
















U.S. Patent Jul. 8, 2003 Sheet 1 of 6 US 6,591,307 B1







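[FIG. 2: processor 28, comprising processing logic 30, cache hierarchy 32 with memory map (MM) 36, and communication logic 34]

[FIG. 6A: request transaction 80, with master node ID field 82, transaction type (TT) field 84, and address field 86]

[FIG. 6B: response 90, with snooper node ID field 92 and response field 94]

[FIG. 6C: data transaction 100, with destination node ID field 102 and data field 104]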












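[FIG. 4: response and flow control logic 18, showing response logic 60 (snoop responses (SRs) from local agents in; PCR and CR out), flow control logic 62 with address latches 64 on address channels 14a-14k and enable signal 66, a destination-ID-gated latch on data channel 16, and cancellation logic 74 (cancel signals from local agents in; cancel out)]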







MULTI-NODE DATA PROCESSING SYSTEM AND METHOD OF QUEUE MANAGEMENT IN WHICH A QUEUED OPERATION IS SPECULATIVELY CANCELLED IN RESPONSE TO A PARTIAL COMBINED RESPONSE

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is related to the following co-pending applications, which are filed on even date herewith and incorporated herein by reference:

(1) U.S. application Ser. No. 09/436,898; 

(2) U.S. application Ser. No. 09/436,899; 

(3) U.S. application Ser. No. 09/436,901; and 

(4) U.S. application Ser. No. 09/436,900.

BACKGROUND OF THE INVENTION 

1. Technical Field

The present invention relates in general to data processing and, in particular, to communication within a data processing system. Still more particularly, the present invention relates to a multi-node data processing system and communication protocol that support a partial combined response.

2. Description of the Related Art 

It is well-known in the computer arts that greater computer system performance can be achieved by harnessing the processing power of multiple individual processors in tandem. Multi-processor (MP) computer systems can be designed with a number of different architectures, of which various ones may be better suited for particular applications depending upon the design point, performance requirements, and software environment of each application. Known architectures include, for example, the symmetric multiprocessor (SMP) and non-uniform memory access (NUMA) architectures. Until the present invention, it has generally been assumed that greater scalability and hence greater performance is obtained by designing more hierarchical computer systems, that is, computer systems having more layers of interconnects and fewer connections per interconnect.

The present invention recognizes, however, that such hierarchical computer systems incur extremely high access latency for the percentage of data requests and other transactions that must be communicated between processors coupled to different interconnects. For example, even for the relatively simple case of an 8-way SMP system in which four processors present in each of two nodes are coupled by an upper level bus and the two nodes are themselves coupled by a lower level bus, communication of a data request between processors in different nodes will incur bus acquisition and other transaction-related latency at each of three buses. Because such latencies are only compounded by increasing the depth of the interconnect hierarchy, the present invention recognizes that it would be desirable and advantageous to provide an improved data processing system architecture having reduced latency for transactions between physically remote processors.

SUMMARY OF THE INVENTION 

The present invention realizes the above and other advantages in a multi-node data processing system having a non-hierarchical interconnect architecture.

In accordance with the present invention, a data processing system includes an interconnect, a plurality of nodes coupled to the interconnect that each include at least one agent, response logic within each node, and a queue. In response to snooping a transaction on the interconnect, each agent outputs a snoop response. In addition, the queue, which has an associated agent, allocates an entry to service the transaction. The response logic within each node accumulates a partial combined response of its node and any preceding node until a complete combined response for all of the plurality of nodes is obtained. However, prior to the associated agent receiving the complete combined response, the queue speculatively deallocates the entry if the partial combined response indicates that an agent other than the associated agent will service the transaction.
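The speculative deallocation just summarized can be sketched in Python. This is a minimal model under stated assumptions, not the patented circuit: snoop-response priorities are encoded as integers where a larger value means higher priority, the names `SnoopQueue`, `PartialCombinedResponse`, and their methods are hypothetical, and a full queue simply refuses the allocation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PartialCombinedResponse:
    """Hypothetical PCR: highest-priority snoop response so far and the node that gave it."""
    priority: int          # assumed encoding: larger value = higher priority
    snooper_node_id: int


class SnoopQueue:
    """Sketch of one agent's queue that frees entries early when a PCR shows another agent wins."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # transaction tag -> priority of this agent's own snoop response
        self.entries: Dict[int, int] = {}

    def allocate(self, tag: int, own_priority: int) -> bool:
        """Allocate an entry for a snooped transaction; return False if the queue is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries[tag] = own_priority
        return True

    def on_partial_combined_response(self, tag: int, pcr: PartialCombinedResponse) -> None:
        """Speculatively deallocate if the PCR already outranks this agent's own response."""
        own = self.entries.get(tag)
        if own is not None and own < pcr.priority:
            # Another agent will service the transaction, so the entry is released
            # without waiting for the complete combined response.
            del self.entries[tag]

    def on_combined_response(self, tag: int) -> None:
        """The complete CR retires any entry that was not released early."""
        self.entries.pop(tag, None)
```

For example, a one-entry queue that gave a low-priority response to transaction 7 frees its entry as soon as a PCR carrying a higher-priority response arrives, so it can accept transaction 8 before the CR for transaction 7 is seen.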

All objects, features, and advantages of the present invention will become apparent in the following detailed written description.

BRIEF DESCRIPTION OF THE DRAWINGS 

The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objects and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:

FIG. 1 depicts an illustrative embodiment of a multi-node data processing system having a non-hierarchical interconnect architecture in accordance with the present invention;

FIG. 2 is a more detailed block diagram of a processor embodiment of an agent within the data processing system of FIG. 1;

FIG. 3 is a more detailed block diagram of the communication logic of the processor in FIG. 2;

FIG. 4 is a more detailed block diagram of response and flow control logic within the data processing system shown in FIG. 1;

FIG. 5A is a timing diagram of an exemplary address transaction in the data processing system illustrated in FIG. 1;

FIG. 5B is a timing diagram of an exemplary read-data transaction in the data processing system depicted in FIG. 1;

FIG. 5C is a timing diagram of an exemplary write-data transaction in the data processing system illustrated in FIG. 1;

FIG. 6A depicts an exemplary format of a request transaction transmitted via one of the address channels of the data processing system shown in FIG. 1;

FIG. 6B illustrates an exemplary format of a partial combined response or combined response transmitted via one of the response channels of the data processing system of FIG. 1;

FIG. 6C depicts an exemplary format of a data transaction transmitted via the data channel of the data processing system of FIG. 1; and

FIG. 7 illustrates an alternative embodiment of a multi-node data processing system having a non-hierarchical interconnect architecture in accordance with the present invention.

DETAILED DESCRIPTION OF ILLUSTRATIVE 
EMBODIMENT 

With reference now to the figures and in particular with reference to FIG. 1, there is depicted an illustrative embodiment of a multi-node data processing system 8 having a non-hierarchical interconnect architecture in accordance with the present invention. As shown, data processing system 8 includes a number of nodes 10a-10k, which are coupled together in a ring configuration by a segmented interconnect 12 having one segment per node 10.

In addition to a segment of interconnect 12, each node 10 of data processing system 8 includes one or more agents that are each coupled to interconnect 12 and are designated A0-An for node 10a, B0-Bn for node 10b, etc. Each node 10 also includes respective response and flow control logic 18 that controls the flow of transactions on interconnect 12 between its node 10 and a neighboring node 10 and generates sideband signals (discussed below) that indicate how agents snooping a request should respond. The number of agents within each node 10 is preferably limited to an interconnect-dependent performance-optimized number (e.g., 8 or 16), with greater system scale being achieved by adding additional nodes 10 to data processing system 8.

Turning now more specifically to the interconnect architecture of data processing system 8, interconnect 12 includes at least one (and in the illustrated embodiment a single) data channel 16 and a plurality of non-blocking address channels 14a-14k that are each associated with a respective one of nodes 10a-10k such that only agents within the associated node 10 can issue requests on an address channel 14. Each of address channels 14 and data channel 16 is segmented, as noted above, such that each node 10 contains a segment of each address and data channel, and each address and data channel segment is coupled to at least two neighboring segments of the same channel. As indicated by arrows, each channel is also uni-directional, meaning that address and data transactions on interconnect 12 are only propagated between neighboring nodes 10 in the indicated direction. In the illustrated embodiment, each segment of an address channel 14 is implemented as an address bus that conveys 32 address bits in parallel, and each segment of data channel 16 is implemented as a data bus that conveys 16 data bytes in parallel; however, it will be appreciated that individual segments of interconnect 12 can alternatively be implemented with switch-based or hybrid interconnects and that other embodiments of the present invention may implement different channel widths.

In conjunction with interconnect 12, data processing system 8 implements three sideband channels (a partial combined response channel 24, a combined response channel 26, and a cancel channel 27) to respectively communicate partial combined responses, combined responses, and a cancel (or stomp) signal. As utilized herein, a partial combined response (or PCR) is defined as a cumulative response to a request of all agents within fewer than all nodes, and a combined response (or CR) is defined as a cumulative response to a request by all agents in all nodes. As discussed further below, agents are able to determine by reference to the PCR, CR, and cancel signal associated with a request snooped on an address channel 14 whether or not to service the request.

Referring now to FIG. 2, there is depicted a block diagram of a processor 28 that can be utilized to implement any agent within data processing system 8. Although hereafter it is assumed that each agent within data processing system 8 is a processor, it should be understood that an agent can be any device capable of supporting the communication protocol described herein.

As shown in FIG. 2, processor 28 includes processing logic 30 for processing instructions and data, communication logic 34, which implements a communication protocol that governs communication on interconnect 12, and a cache hierarchy 32 that provides local, low latency storage for instructions and data. In addition to cache hierarchy 32, which may include, for example, level one (L1) and level two (L2) caches, the local storage of each processor 28 may include an associated off-chip level three (L3) cache 20 and local memory 22, as shown in FIG. 1. Instructions and data are preferably distributed among local memories 22 such that the aggregate of the contents of all local memories 22 forms a shared main memory that is accessible to any agent within data processing system 8. Hereinafter, the local memory 22 containing a storage location associated with a particular address is said to be the home local memory for that address, and the agent interposed between the home local memory and interconnect 12 is said to be the home agent for that address. As shown in FIG. 2, each home agent has a memory map 36 accessible to cache hierarchy 32 and communication logic 34 that indicates only what memory addresses are contained in the attached local memory 22.

With reference now to FIG. 3, there is illustrated a more detailed block diagram representation of an illustrative embodiment of communication logic 34 of FIG. 2. As illustrated, communication logic 34 includes master circuitry comprising master control logic 40, a master address sequencer 42 for sourcing request (address) transactions on an address channel 14, and a master data sequencer 44 for sourcing data transactions on data channel 16. Importantly, to ensure that each of address channels 14 is non-blocking, the master address sequencer 42 of each agent within a given node 10 is connected to only the address channel 14 associated with its node 10. Thus, for example, the master address sequencer 42 of each of agents A0-An is connected to only address channel 14a, the master address sequencer 42 of each of agents B0-Bn is connected to only address channel 14b, and the master address sequencer 42 of each of agents K0-Kn is connected to only address channel 14k. To fairly allocate utilization of address channels 14 and ensure that local agents do not issue conflicting address transactions, some arbitration mechanism (e.g., round robin or time slice) should be utilized to arbitrate between agents within the same node 10.

By contrast, the master data sequencers 44 of all agents within data processing system 8 are connected to data channel 16. Although a large number of agents may be connected to data channel 16, in operation data channel 16 is non-blocking since the types of data transactions that may be conveyed by data channel 16, which predominantly contain (1) modified data sourced from an agent other than the home agent, (2) data sourced from the home agent, and (3) modified data written back to the home local memory 22, are statistically infrequent for applications in which the distribution of memory among local memories 22 and the distribution of processes among the agents is optimized. Of course, in implementations including only a single data channel 16, some arbitration mechanism (e.g., round robin or time slice) should be utilized to arbitrate between agents within the same node 10 to ensure that local agents do not issue conflicting data transactions.

Communication logic 34 also includes snooper circuitry comprising a snooper address and response sequencer 52 coupled to each address channel 14 and to sideband response channels 24 and 26, a snooper data sequencer 54 coupled to data channel 16, and snooper control logic 50 connected to snooper address and response sequencer 52 and to snooper data sequencer 54. In response to receipt of a request transaction by snooper address and response sequencer 52 or a data transaction by snooper data sequencer 54, the transaction is passed to snooper control logic 50. Snooper control logic 50 processes the transaction in accordance with the implemented communication protocol and, if a request transaction, provides a snoop response and possibly a cancel signal to its node's response and flow control logic 18. Depending upon the type of transaction received, snooper control logic 50 may initiate an update to a directory or data array of cache hierarchy 32, a write to the local memory 22, or some other action. Snooper control logic 50 performs such processing of request and data transactions from a set of request queues 56 and data queues 58, respectively.
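The text leaves the intra-node arbitration mechanism open (round robin or time slice). As one illustration, a round-robin arbiter that grants a node's address channel 14 to one requesting local agent per cycle might look like the sketch below; the class name, the per-cycle `grant` call, and the boolean request vector are assumptions made for the example.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Hypothetical round-robin arbiter for the agents of one node sharing address channel 14."""

    def __init__(self, num_agents: int) -> None:
        if num_agents <= 0:
            raise ValueError("a node needs at least one agent")
        self.num_agents = num_agents
        # Start as if the last agent was granted most recently, so agent 0 goes first.
        self.last_grant = num_agents - 1

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the index of the agent allowed to issue a request this cycle, or None."""
        if len(requests) != self.num_agents:
            raise ValueError("one request flag per agent is expected")
        # Search the agents in circular order starting just after the last winner.
        for offset in range(1, self.num_agents + 1):
            agent = (self.last_grant + offset) % self.num_agents
            if requests[agent]:
                self.last_grant = agent
                return agent
        return None
```

Because the most recent winner moves to the back of the search order, an agent that keeps requesting cannot starve its neighbors; a time-slice scheme would instead give each agent fixed cycles.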

Referring now to FIG. 4, there is depicted a more detailed block diagram of an exemplary embodiment of response and flow control logic 18. As illustrated, response and flow control logic 18 includes response logic 60, which combines snoop responses from local agents and possibly a PCR from a neighboring node 10 to produce a cumulative PCR indicative of the partial combined response for all nodes that have received the associated transaction. For example, if agent A0 of node 10a masters a request on address channel 14a, agents A1-An provide snoop responses that are combined by response and flow control logic 18a to produce a PCR_A that is provided on PCR bus 24. When the request is snooped by agents B0-Bn, agents B0-Bn similarly provide snoop responses, which are combined with PCR_A of node 10a by response and flow control logic 18b to produce a cumulative PCR_{A+B}. This process continues until a complete combined response is obtained (i.e., PCR_{A+B+...+K} = CR). Once the CR is obtained, the CR is made visible to all nodes via CR channel 26. Depending upon the desired implementation, the CR for a request can be provided on CR channel 26 by the response and flow control logic 18 of either the last node 10 receiving the request or the master node 10 containing the master agent. It is presently preferable, both in terms of complexity and resource utilization, for the response logic 60 of the master node 10 to provide the CR for a request, thus permitting agents within the master node 10 to receive the CR prior to agents within any other node 10. This permits the master agent, for example, to retire queues in master control logic 40 that are allocated to the request as soon as possible.
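The node-by-node accumulation of partial combined responses described above can be modelled as a fold around the ring. The sketch is illustrative only: each snoop response is reduced to an integer priority (larger wins), the function name `accumulate_responses` is hypothetical, and ties are assumed to keep the response seen first.

```python
from typing import List, Tuple

# (highest snoop-response priority seen so far, node ID of the agent that gave it)
Pcr = Tuple[int, int]


def accumulate_responses(snoops_per_node: List[List[int]], master: int) -> Tuple[List[Pcr], Pcr]:
    """Fold each node's local snoop responses into the PCR from the preceding node.

    snoops_per_node[i] holds the response priorities (assumed >= 0) of the agents in node i.
    Returns the PCR produced at each node in ring order and the final value, which
    covers every node and is therefore the combined response (CR).
    """
    n = len(snoops_per_node)
    if n == 0:
        raise ValueError("the ring must contain at least one node")
    best_priority, best_node = -1, -1  # sentinel below every real priority
    history: List[Pcr] = []
    for step in range(n):
        node = (master + step) % n  # the request travels from the master node around the ring
        for priority in snoops_per_node[node]:
            if priority > best_priority:  # strict '>' keeps the earlier response on a tie
                best_priority, best_node = priority, node
        history.append((best_priority, best_node))
    return history, history[-1]
```

Returning the final PCR as the CR mirrors the preferred arrangement in which that last PCR reaches response logic 60 of the master node, which then broadcasts the CR.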

As is further illustrated in FIG. 4, response and flow control logic 18 also contains flow control logic 62, which includes address latches 64 connecting neighboring segments of each of address channels 14a-14k. Address latches 64 are enabled by an enable signal 66, which can be derived from an interconnect clock, for example. Flow control logic 62 also includes a data latch 72 that connects neighboring segments of data channel 16. As indicated by enable logic including XOR gate 68 and AND gate 70, data latch 72 operates to output a data transaction to the neighboring segment of data channel 16 only if the data transaction's destination identifier (ID) does not match the unique node ID of the current node 10 (i.e., if the data transaction specifies an intended recipient node 10 other than the current node 10). Thus, data transactions communicated on data channel 16, which can contain either read data or write data, propagate from the source node to the destination node (which may be the same node), utilizing only the segments of data channel 16 within these nodes and any intervening node(s) 10.
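The forwarding rule for data latch 72 reduces to a comparison between the destination node ID and the local node ID. A minimal sketch, assuming integer node IDs and that the ring carries data toward the next-higher node index (wrapping around); both function names are hypothetical:

```python
from typing import List


def latch_forwards(dest_node_id: int, this_node_id: int, enable: bool = True) -> bool:
    """Model of XOR gate 68 and AND gate 70: pass the data on only if the IDs differ."""
    ids_differ = (dest_node_id ^ this_node_id) != 0  # bitwise XOR is nonzero iff the IDs differ
    return enable and ids_differ


def data_path(source: int, dest: int, num_nodes: int) -> List[int]:
    """Nodes whose data-channel segments carry a transaction from source to dest."""
    if not (0 <= source < num_nodes and 0 <= dest < num_nodes):
        raise ValueError("node IDs must lie in range(num_nodes)")
    path = [source]
    node = source
    while latch_forwards(dest, node):
        node = (node + 1) % num_nodes  # assumed ring direction: next-higher index
        path.append(node)
    return path
```

With three nodes, for instance, data sourced in node 1 for a recipient in node 0 uses the segments of nodes 1, 2, and 0 and stops there, so only the source, destination, and intervening segments are occupied.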

Each response and flow control logic 18 further includes cancellation logic 74, which is implemented as an OR gate 76 in the depicted embodiment. Cancellation logic 74 has an output coupled to cancel channel 27 and an input coupled to the cancel signal output of the snooper control logic 50 of each agent within the local node 10. The snooper control logic 50 of an agent asserts its cancel signal if the snooper control logic 50 determines, prior to receiving the PCR from another node 10, that a request issued by an agent within the local node 10 will be serviced by an agent within the local node 10. Depending on the desired implementation, the cancel signal can be asserted by either or both of the master agent that issued the request and the snooping agent that will service the request. In response to the assertion of the cancel signal of any agent within the node 10 containing the master agent, cancellation logic 74 asserts a cancel signal on cancel channel 27, which instructs the snooper control logic 50 of agents in each other node 10 to ignore the request. Thus, the assertion of a cancel signal improves the queue utilization of agents in remote nodes 10 by preventing the unnecessary allocation of request and data queues 56 and 58.
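In software terms, cancellation logic 74 is an OR over the local cancel signals, and a remote snooper that sees the cancel simply drops the request. The sketch below assumes integer transaction tags and a plain list standing in for request queues 56; `cancellation_logic` and `RemoteSnooper` are illustrative names, not taken from the source.

```python
from dataclasses import dataclass, field
from typing import List


def cancellation_logic(local_cancel_signals: List[bool]) -> bool:
    """OR gate 76: drive cancel channel 27 if any local agent asserts its cancel signal."""
    return any(local_cancel_signals)


@dataclass
class RemoteSnooper:
    """Hypothetical agent in another node; request_queue stands in for request queues 56."""
    request_queue: List[int] = field(default_factory=list)

    def snoop(self, tag: int, cancel: bool) -> None:
        if cancel:
            # Cancelled request: ignore it and free any entry already allocated for it.
            if tag in self.request_queue:
                self.request_queue.remove(tag)
        elif tag not in self.request_queue:
            # Normal case: allocate an entry so a snoop response can be produced.
            self.request_queue.append(tag)
```

The removal branch covers a snooper that had already allocated an entry; one that sees the cancel first never allocates at all, which is the queue saving the paragraph describes.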

With reference now to FIG. 5A, a timing diagram of an exemplary request transaction in the data processing system of FIG. 1 is depicted. The request transaction is initiated by a master agent, for example, agent A0 of node 10a, mastering a read or write request transaction on the address channel 14 associated with its node, in this case address channel 14a. As shown in FIG. 6A, the request transaction 80 may contain, for example, a master node ID field 82 indicating the node ID of the master agent, a transaction type (TT) field 84 indicating whether the request transaction is a read (e.g., read-only or read-with-intent-to-modify) or write request, and a request address field 86 specifying the request address. The request transaction propagates sequentially from node 10a to node 10b and eventually to node 10k via address channel 14a. Of course, while the request transaction is propagating through other nodes 10, other request transactions may be made concurrently on address channel 14a or address channels 14b-14k.
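The request transaction of FIG. 6A and the order in which nodes snoop it can be sketched as follows. The field names mirror fields 82, 84, and 86, but the Python types, the `TransactionType` values, and the `snoop_order` helper are assumptions for illustration; node IDs 0, 1, 2 stand for nodes 10a, 10b, 10c, and so on.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class TransactionType(Enum):
    """TT field 84 values mentioned in the text (encoding assumed)."""
    READ_ONLY = "read-only"
    READ_WITH_INTENT_TO_MODIFY = "read-with-intent-to-modify"
    WRITE = "write"


@dataclass(frozen=True)
class RequestTransaction:
    master_node_id: int   # field 82: node ID of the master agent
    tt: TransactionType   # field 84: transaction type
    address: int          # field 86: request address


def snoop_order(request: RequestTransaction, num_nodes: int) -> List[int]:
    """Nodes in the order they see the request on the master node's address channel."""
    return [(request.master_node_id + i) % num_nodes for i in range(num_nodes)]
```

A request mastered in node 0 is therefore seen by nodes 0, 1, 2, ..., matching the 10a, 10b, ..., 10k sequence above, while a request from another node's channel starts at that node and wraps around the ring.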

As discussed above and as shown in FIG. 5A, after the 
snooper address and response sequencer 52 of each agent 
snoops the request transaction on address channel 14a, the 
request transaction is forwarded to snooper control logic 50, 
which provides to the local response and flow control logic 
18 an appropriate snoop response indicating whether that 
agent can service (or participate in servicing) the request. 
Possible snoop responses are listed in Table I below in order 
of descending priority. 

TABLE I

Snoop response          Meaning
Retry                   Retry transaction
Modified intervention   Agent holds requested line in a modified state in cache from which data can be sourced
Shared intervention     Agent holds requested line in a shared state from which data can be sourced
Shared                  Agent holds requested line in a shared state in cache
Home                    Agent is home agent of request address
Null                    Agent does not hold the requested line in cache and is not the home agent
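Because Table I is ordered by priority and each PCR or CR keeps only the highest-priority response seen so far together with the snooper's node ID, combining responses amounts to taking a maximum. A sketch, assuming an integer encoding of Table I (larger wins) and first-seen tie-breaking, neither of which the source specifies; `combine` is a hypothetical name:

```python
from enum import IntEnum
from typing import Iterable, Optional, Tuple


class SnoopResponse(IntEnum):
    # Table I, listed from lowest to highest priority so that larger values win.
    NULL = 0
    HOME = 1
    SHARED = 2
    SHARED_INTERVENTION = 3
    MODIFIED_INTERVENTION = 4
    RETRY = 5


def combine(responses: Iterable[Tuple[SnoopResponse, int]]) -> Tuple[SnoopResponse, int]:
    """Return (response field 94, snooper node ID field 92) for the highest-priority
    (response, node ID) pair; on a tie the first one seen is kept (assumption)."""
    best: Optional[Tuple[SnoopResponse, int]] = None
    for resp, node_id in responses:
        if best is None or resp > best[0]:
            best = (resp, node_id)
    if best is None:
        raise ValueError("at least one snoop response is required")
    return best
```

Retry outranks every other response, so a single Retry anywhere on the ring dominates the combined result.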



The snoop responses of only agents A0-An are then combined by response and flow control logic 18a into a PCR_A output on PCR channel 24. As indicated in FIG. 6B, a response 90, which may be either a PCR or a CR, includes at least a response field 94 indicating the highest priority snoop response yet received and a snooper node ID field 92 indicating the node ID of the agent providing the highest priority snoop response yet received.

If, during a determination of the appropriate snoop response, the snooper control logic 50 of an agent within node 10a determines that it is likely to have the highest priority snoop response of all agents within data processing system 8, for example, Modified Intervention for a read request or Home for a write request, the agent within node 10a asserts its cancel signal to the local cancellation logic 74, which outputs a cancel signal on cancel channel 27. As shown in FIG. 5A, the cancel signal is preferably asserted on cancel channel 27 prior to the PCR. Thus, each agent within the nodes that subsequently receive the request transaction (i.e., nodes 10b-10k) can cancel the request queue 56 that is allocated within snooper control logic 50 to provide the snoop response for the request, and no other snoop responses and no PCR or CR will be generated for the request transaction.

Assuming that no agent within the master node 10a asserts its cancel signal to indicate that the request transaction will be serviced locally, agents B0-Bn within neighboring node 10b will provide snoop responses, which are combined together with PCR_A by response and flow control logic 18b to produce PCR_{A+B}. The process of accumulating PCRs thereafter continues until response and flow control logic 18k produces PCR_{A+B+...+K}, which contains the node ID of the agent that will participate in servicing the request transaction and the snoop response of that agent. Thus, for a read request, the final PCR contains the node ID of the agent that will source the requested cache line of data, and for a write request the final PCR specifies the node ID of the home agent for the requested cache line of data. When PCR_{A+B+...+K}, which is equivalent to the CR, is received by response logic 60 within node 10a, response logic 60 of node 10a provides the CR to all agents on CR channel 26.

As illustrated in FIGS. 1 and 3, each agent within data processing system 8 is coupled to and snoops PCRs on PCR channel 24. In contrast to conventional multi-processor systems in which processors only receive CRs, the present invention makes PCRs visible to agents to permit agents that are not likely to service a snooped request to speculatively cancel queues (e.g., request and/or data queues 56 and 58) allocated to the request prior to receipt of the CR for the request. Thus, if an agent provides a lower priority snoop response to a request than is indicated in the PCR, the agent can safely cancel any queues allocated to the request prior to receiving the CR. This early deallocation of queues advantageously increases the effective size of each agent's queues.

With reference now to FIGS. 5B and 5C, there are respectively illustrated timing diagrams of an exemplary read-data transaction and an exemplary write-data transaction in data processing system 8 of FIG. 1. Each of the illustrated data transactions follows a request (address) transaction such as that illustrated in FIG. 5A and assumes agent B0 of node 10b participates with agent A0 of node 10a in the data transaction.

Referring first to the read-data transaction shown in FIG. 5B, when the CR output on CR channel 26 by response and flow control logic 18a is received by agent B0, agent B0, which responded to the request transaction with a Modified Intervention, Shared Intervention or Home snoop response indicating that agent B0 could source the requested data, sources a data transaction on data channel 16 containing a cache line of data associated with the request address. As illustrated in FIG. 6C, in a preferred embodiment a read-data or write-data transaction 100 includes at least a data field 104 and a destination node ID field 102 specifying the node ID of the node 10 containing the intended recipient agent (in this case node 10a). For read-data requests such as that illustrated in FIG. 5B, the destination node ID [...] transaction.

The data transaction sourced by agent B0 is then propagated via data channel 16 through each node 10 until node 10a is reached. As indicated in FIG. 5B, response and flow control logic 18a of node 10a does not forward the data transaction to node 10b since the destination node ID contained in field 102 of the data transaction matches the node ID of node 10a. Snooper data sequencer 54 of agent A0 finally snoops the data transaction from data channel 16 to complete the data transaction. The cache line of data may thereafter be stored in cache hierarchy 32 and/or supplied to processing logic 30 of agent A0.

Referring now to FIG. 5C, a write-data transaction begins when agent A0, the agent that mastered the write request, receives the CR for the write request via CR channel 26. Importantly, the CR contains the node ID of the home agent in snooper node ID field 92, as described above. Agent A0 places this node ID in destination node ID field 102 of a write-data transaction and sources the data transaction on data channel 16. As indicated in FIG. 5C, response and flow control logic 18b of node 10b does not forward the data transaction to any subsequent neighboring node 10 since the destination node ID contained in field 102 of the data transaction matches the node ID of node 10b. Snooper data sequencer 54 of agent B0 finally snoops the data transaction from data channel 16 to complete the data transaction. The data may thereafter be written into local memory 22 of agent B0.

With reference now to FIG. 7, there is illustrated an alternative embodiment of a multi-node data processing system 108 having a non-hierarchical interconnect architecture in accordance with the present invention. As shown, data processing system 108, like data processing system 8 of FIG. 1, includes a number of nodes 10a-10k, which are coupled together in a ring configuration by a segmented interconnect 112 having one segment per node 10. Interconnect 112 includes at least one (and in the illustrated embodiment a single) data channel 16 and a plurality of non-blocking address channels 14a-14n that are each associated with a particular agent (or connection for an agent) in each one of nodes 10a-10k, such that only agents with the corresponding numerical designation can issue requests on an address channel 14. That is, although each agent snoops all address channels 14, only agents A0, B0, ..., K0 can issue requests on address channel 14a, and only agents An, Bn, ..., Kn can issue requests on address channel 14n. Thus, the principal difference between the embodiments depicted in FIGS. 1 and 7 is the centralization of master agents for a particular address channel 14 within a single node in FIG. 1 versus the one-per-node distribution of master agents for a particular address channel 14 among nodes 10 in FIG. 7.

One advantage of the interconnect architecture illustrated in FIG. 7 is that master agents need not arbitrate for their associated address channels 14. If the snooper control logic 50 of an agent detects that no address transaction is currently being received on the associated address channel, the master control logic 40 can source an address transaction on its address channel 14 without the possibility of collision with another address transaction.

As has been described, the present invention provides an improved non-hierarchical interconnect for a multi-node data processing system. The interconnect architecture introduced by the present invention has an associated communication protocol having a distributed combined response
mechanism that accumulates per-node partial combined 
responses until a complete combined response can be 
obtained and provided to all nodes. For both read and write 5 
communication scenarios, the combined response, in addi- 
tion to conveying the snoop response of a servicing agent, 
indicates the node ID of the node containing the servicing 
agent. In this manner, read and write data can be directed 
from a source agent to a target agent without being propa- 
gated to other nodes unnecessarily. The present invention 
also introduces two mechanisms to facilitate better commu- 
nication queue management: a cancel mechanism to enable 
remote nodes to ignore a request that can be serviced locally 
and a speculative cancellation mechanism that enables an 15 
agent to speculatively cancel a queue allocated to a request 
in response to the partial combined response for the request. 
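The interplay between the distributed partial combined response and speculative queue cancellation can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not the patented hardware: the names `Snoop`, `PartialCombinedResponse`, `Agent` and `accumulate` are hypothetical, the numeric ordering of the Null, Home, Shared Intervention and Modified Intervention responses is assumed for illustration, and a PCR is modeled simply as the highest-priority response seen so far plus the node ID of the agent that gave it. An agent releases its queue entry only when its own response is strictly lower than the PCR arriving from preceding nodes. This mirrors the rule that an agent providing a lower priority snoop response than indicated in the PCR can safely cancel its queues before the CR arrives.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Snoop(IntEnum):
    """Assumed sourcing priorities (higher value = stronger claim to source data)."""
    NULL = 0
    HOME = 1                   # e.g., a memory controller that owns the address
    SHARED_INTERVENTION = 2
    MODIFIED_INTERVENTION = 3


@dataclass(frozen=True)
class PartialCombinedResponse:
    priority: Snoop = Snoop.NULL
    node_id: Optional[int] = None  # node of the highest-priority responder so far


@dataclass
class Agent:
    name: str
    response: Snoop
    queue_entry: bool = True  # entry allocated when the transaction was snooped


def accumulate(nodes: list[list[Agent]], master: int) -> PartialCombinedResponse:
    """Pass the PCR around a ring of nodes, starting at the master node.

    Before folding in its own agents, each node compares their snoop responses
    with the PCR received from the preceding nodes. An agent whose response is
    strictly lower cannot be the servicing agent, so its queue entry is
    released speculatively, ahead of the complete combined response (CR).
    """
    pcr = PartialCombinedResponse()
    for step in range(len(nodes)):
        node_id = (master + step) % len(nodes)
        for agent in nodes[node_id]:
            if agent.response < pcr.priority:
                agent.queue_entry = False  # speculative deallocation
        for agent in nodes[node_id]:
            if agent.response > pcr.priority:
                pcr = PartialCombinedResponse(agent.response, node_id)
    # After the last node the PCR is the complete CR, including the node ID
    # of the servicing agent, which would then be provided to all nodes.
    return pcr


nodes = [
    [Agent("A0", Snoop.NULL)],                           # node 0: master
    [Agent("B0", Snoop.MODIFIED_INTERVENTION)],          # node 1: holds modified line
    [Agent("C0", Snoop.HOME), Agent("C1", Snoop.NULL)],  # node 2: memory ctrl, idle cache
]
cr = accumulate(nodes, master=0)
print(cr.priority.name, cr.node_id)       # MODIFIED_INTERVENTION 1
print([a.queue_entry for a in nodes[2]])  # [False, False]
```

In this example the memory controller agent C0 in node 2 frees its entry early because the PCR it receives already reports a higher-priority Modified Intervention response from node 1. If the ring order is changed so that node 2 is visited before node 1 (for example, `master=2`), C0 sees only a Null PCR and must hold its entry until the CR arrives.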

While the invention has been particularly shown and 
described with reference to a preferred embodiment, it will 
be understood by those skilled in the art that various changes 
in form and detail may be made therein without departing 
from the spirit and scope of the invention. 

What is claimed is: 

1. A data processing system, comprising:
an interconnect; 

a plurality of nodes coupled to said interconnect, wherein 
each of said plurality of nodes includes at least one 
agent and at least one of said plurality of nodes includes 
multiple agents, wherein each agent in all of said 
plurality of nodes snoops a transaction transmitted on 
said interconnect and outputs a snoop response in 
response to snooping the transaction; 

response logic within each node that accumulates a partial
combined response to said transaction, said partial 
combined response representing a combination of the 
snoop response of each agent within its node and a 
partial combined response of any preceding node, 
wherein the response logic within a node among said
plurality of nodes accumulates a partial combined 
response of one or more preceding nodes with the 
snoop response of each of one or more agents within its 
node to obtain a complete combined response to the 
transaction of all agents within all of said plurality of
nodes, and wherein said response logic of the node 
provides said complete combined response to all of said 
plurality of nodes; and 

a queue that, responsive to an associated agent snooping 
the transaction, allocates an entry to service said
transaction, wherein said queue speculatively deallo- 
cates said entry prior to receipt of said complete 
combined response by said associated agent in response 
to a partial combined response indicating that an agent 
other than said associated agent will service said transaction.

2. The data processing system of claim 1, and further 
comprising a memory controller coupled to said associated 
agent, said memory controller containing said queue. 

3. The data processing system of claim 2, wherein said 
transaction is a read request and said partial combined
response indicates that a preceding node contains an agent 
having a higher sourcing priority. 

4. The data processing system of claim 1, said plurality of 
nodes including a mastering node containing a mastering 
agent that issued said transaction, wherein response logic
within said mastering node provides said complete
combined response to all of said plurality of nodes.
5. The data processing system of claim 1, wherein said
interconnect comprises: 

a plurality of address channels, wherein each agent in all 
of said plurality of nodes is coupled to all of said 
plurality of address channels, and wherein each agent 
can only master transactions on an address channel 
associated with its node and snoops transactions on all 
of said plurality of address channels; and 
at least one data channel. 

6. The data processing system of claim 1, wherein said 
plurality of nodes includes at least three nodes, and wherein
said interconnect includes an address portion that couples
said plurality of nodes in a ring topology.

7. The data processing system of claim 6, wherein: 
said plurality of nodes sequentially receive said transac- 
tion from said interconnect; and 

said response logic produces said partial combined 
response to said transaction for its node based upon 
snoop responses of agents in its node and any node 
previously receiving the transaction on said
interconnect.

8. A method of communication in a data processing 
system including an interconnect coupling a plurality of 
nodes that each include at least one agent and response logic, 
wherein at least one of said plurality of nodes includes a 
plurality of agents, said method comprising: 

in response to snooping a transaction transmitted on said 
interconnect, outputting, from each agent, a snoop 
response and, at a queue having an associated agent, 
allocating an entry to service said transaction; 
utilizing the response logic of each node, accumulating a 
partial combined response of the node and any preced- 
ing node until a complete combined response to said 
transaction for all of the agents in all of said plurality of
nodes is obtained, said partial combined response of 
each node representing a combination of the snoop 
response of each agent within that node and a partial 
combined response of any preceding node; 
providing said complete combined response to all of said
plurality of nodes; and
prior to receipt of said complete combined response by 
said associated agent, speculatively deallocating said 
entry in response to a partial combined response
indicating that an agent other than said associated agent will
service said transaction. 

9. The method of claim 8, said data processing system 
further comprising a memory controller coupled to said 
associated agent, said memory controller containing said 
queue, wherein said step of allocating an entry to service 
said transaction comprises allocating an entry within said 
queue contained in said memory controller. 

10. The method of claim 9, wherein said transaction is a 
read request, and wherein speculatively deallocating said 
entry comprises: 

speculatively deallocating said entry in response to said
partial combined response indicating that a preceding 
node contains an agent having a higher sourcing pri- 
ority. 

11. The method of claim 8, said plurality of nodes 
including a mastering node containing a mastering agent that 
issued said transaction, wherein providing said complete 
combined response comprises providing said complete com- 
bined response to all of said plurality of nodes from response 
logic within said mastering node. 
12. The method of claim 8, wherein said interconnect
comprises a plurality of address channels and at least one 
data channel, said method further comprising: 

coupling each agent in all of said plurality of nodes to all 
of said plurality of address channels and to said at least 
one data channel, such that each agent can only master 
transactions on an address channel associated with its 
node and snoops transactions on all of said plurality of 
address channels. 

13. The method of claim 8, wherein said plurality of nodes 
includes at least three nodes, and wherein said interconnect 
includes an address portion, said method further comprising 
coupling said plurality of nodes in a ring topology with said
address portion of said interconnect. 
14. The method of claim 13, wherein:
said method further comprises said plurality of nodes 
sequentially receiving said transaction from said inter- 
connect; and 

said accumulating step comprises response logic within 
each node producing a partial combined response to 
said transaction for its node based upon snoop 
responses of agents in its node and any node previously 
receiving the transaction on said interconnect. 
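Claims 5 and 12 describe an interconnect with a plurality of address channels, where each agent may master transactions only on the address channel associated with its own node but snoops transactions on all of them. The toy Python model below illustrates that ownership rule alone. The class and method names are hypothetical, and bus timing, the data channel and response handling are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    node: int   # index of the node containing this agent
    name: str
    snooped: list = field(default_factory=list)  # (channel, address) pairs observed


class AddressChannels:
    """Hypothetical model of claims 5 and 12: one address channel per node."""

    def __init__(self, agents: list, num_channels: int):
        self.agents = agents
        self.num_channels = num_channels

    def issue(self, master: Agent, channel: int, address: int) -> None:
        if not 0 <= channel < self.num_channels:
            raise ValueError(f"no address channel {channel}")
        # An agent may only master transactions on the channel of its own node.
        if channel != master.node:
            raise PermissionError(f"{master.name} cannot master on channel {channel}")
        # Every agent in every node snoops every address channel.
        for agent in self.agents:
            agent.snooped.append((channel, address))


agents = [Agent(0, "A0"), Agent(0, "A1"), Agent(1, "B0")]
channels = AddressChannels(agents, num_channels=2)
channels.issue(agents[2], channel=1, address=0x80)
print(all(a.snooped == [(1, 0x80)] for a in agents))  # True
```

An attempt by agent A0 of node 0 to master a transaction on channel 1 raises `PermissionError` in this model, while the transaction mastered by B0 is observed by every agent, including agents in other nodes.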